Yichu Chen, Gu Gong, Kate Jones, Gianni Spiga
2022-11-29
Sexually transmitted diseases (STDs) are infections that are spread during vaginal, oral, or anal intercourse. Although they sometimes go undetected, STDs can cause serious health problems and lead to reproductive issues. Here, we examine two bacterial infections, gonorrhea and chlamydia, that can be easily treated once diagnosed. To understand the factors that influence the chance of reinfection, and ideally to reduce cases in high-risk populations, we analyze time to reinfection for three groups: those initially infected with gonorrhea, those infected with chlamydia, and those infected with both. We also assess whether various predictors significantly influence the survival probability.
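The predictors are the variables that enter the full Cox model fitted later in this report: iinfct (initial infection type), marital, race, os12m, os30d, rs12m, rs30d, abdpain, discharge, dysuria, condom, itch, lesion, rash, lymph, vagina, dchexam, abnode, age, yschool (years of schooling), and npartner (number of partners).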
If we can understand the factors that lead to an increased risk of reinfection, we can utilize targeted preventive care and hopefully reduce the number of individuals who become infected.
Our first step is to visualize a few of the variables in our data:
Below we create two pie charts. The first shows the percentages for each type of initial infection. The second shows the percentages for the number of partners each patient had.
About 70% of the patients reported having one sex partner, 16.6% reported two, and 8% reported none. Reporting three or more partners was rare, although it is worth noting that one patient reported 19 partners.
Another interesting observation is that about 8% of patients in the study had zero partners but still contracted an STD. We note that diseases like gonorrhea and chlamydia can be transmitted via ocular exposure or needle sharing.
We are also interested in the frequency of symptoms for each type of initial infection among the patients in the study. Overall, discharge is the most common symptom for all three types of initial infection.
## Call:
## survdiff(formula = surv_object ~ iinfct, data = std)
##
##                    N Observed Expected (O-E)^2/E (O-E)^2/V
## iinfct=gonorrhea 140       73     54.5   6.28042   7.50617
## iinfct=chlamydia 396      135    153.0   2.12201   3.81136
## iinfct=both      341      139    139.5   0.00166   0.00278
##
## Chisq= 8.5 on 2 degrees of freedom, p= 0.01
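As a quick check on the log-rank output above, the reported p-value can be reproduced from the chi-square statistic and its degrees of freedom. The sketch below uses only base R and the rounded values printed in the table; the variable names are illustrative and are not part of the original analysis.

```r
# Values copied from the survdiff output above (rounded as printed).
chisq <- 8.5   # log-rank chi-square statistic
df <- 2        # number of groups minus one

# Upper-tail probability of a chi-square distribution.
p_value <- pchisq(chisq, df = df, lower.tail = FALSE)
print(round(p_value, 2))

# The (O-E)^2/E column can be recomputed from the rounded counts.
# Small differences from the printed column come from rounding of "Expected".
observed <- c(gonorrhea = 73, chlamydia = 135, both = 139)
expected <- c(gonorrhea = 54.5, chlamydia = 153.0, both = 139.5)
contrib <- (observed - expected)^2 / expected
print(round(contrib, 2))
```

The gonorrhea group is the only one with clearly more observed reinfections than expected, which is what drives the significant test.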
## Call:
## coxph(formula = surv_object ~ iinfct, data = std)
##
## n= 877, number of events= 347
##
## coef exp(coef) se(coef) z Pr(>|z|)
## iinfctchlamydia -0.4202 0.6569 0.1457 -2.884 0.00393 **
## iinfctboth -0.2980 0.7423 0.1450 -2.055 0.03984 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## iinfctchlamydia 0.6569 1.522 0.4937 0.8741
## iinfctboth 0.7423 1.347 0.5587 0.9863
##
## Concordance= 0.524 (se = 0.016 )
## Likelihood ratio test= 7.93 on 2 df, p=0.02
## Wald test = 8.37 on 2 df, p=0.02
## Score (logrank) test = 8.46 on 2 df, p=0.01
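To make the link between the coefficient table and the hazard-ratio table explicit, the sketch below recomputes `exp(coef)` and the 95% confidence limits from the printed coefficients and standard errors. It uses base R only; the object names are illustrative, and the results match the printed output only up to rounding of the inputs.

```r
# Coefficients and standard errors copied from the coxph output above.
# The reference level is an initial infection of gonorrhea.
coefs <- c(chlamydia = -0.4202, both = -0.2980)
ses   <- c(chlamydia = 0.1457,  both = 0.1450)

z <- qnorm(0.975)  # normal quantile for a two-sided 95% interval

hr <- data.frame(
  hazard_ratio = exp(coefs),
  lower_95 = exp(coefs - z * ses),
  upper_95 = exp(coefs + z * ses)
)
print(round(hr, 4))
```

Both hazard ratios are below 1 and both intervals exclude 1, so in this unadjusted model the chlamydia-only and both-infection groups have a lower estimated hazard of reinfection than the gonorrhea group.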
cox1 <-
  coxph(
    surv_object ~ iinfct + marital + race + os12m + os30d +
      rs12m + rs30d + abdpain + discharge + dysuria + condom +
      itch + lesion + rash + lymph + vagina + dchexam + abnode +
      age + yschool + npartner,
    data = std
  )
summary(cox1)
## Call:
## coxph(formula = surv_object ~ iinfct + marital + race + os12m +
## os30d + rs12m + rs30d + abdpain + discharge + dysuria + condom +
## itch + lesion + rash + lymph + vagina + dchexam + abnode +
## age + yschool + npartner, data = std)
##
## n= 877, number of events= 347
##
## coef exp(coef) se(coef) z Pr(>|z|)
## iinfctchlamydia -0.334628 0.715604 0.149647 -2.236 0.02534 *
## iinfctboth -0.267515 0.765279 0.149987 -1.784 0.07449 .
## maritalMarried 0.055058 1.056602 0.431303 0.128 0.89842
## maritalSingle 0.408097 1.503953 0.295341 1.382 0.16704
## raceWhite -0.111462 0.894526 0.141327 -0.789 0.43030
## os12m1 -0.206151 0.813711 0.212028 -0.972 0.33091
## os30d1 -0.339983 0.711783 0.238657 -1.425 0.15428
## rs12m1 0.033955 1.034538 0.445166 0.076 0.93920
## rs30d1 -0.194771 0.823023 0.565199 -0.345 0.73039
## abdpain1 0.229308 1.257729 0.156236 1.468 0.14219
## discharge1 0.114691 1.121527 0.114283 1.004 0.31559
## dysuria1 0.164089 1.178320 0.155157 1.058 0.29025
## condomsometime -0.064082 0.937928 0.239642 -0.267 0.78916
## condomnever -0.321968 0.724721 0.247790 -1.299 0.19382
## itch1 -0.147451 0.862905 0.154774 -0.953 0.34075
## lesion1 -0.183670 0.832210 0.333523 -0.551 0.58184
## rash1 0.008298 1.008333 0.392928 0.021 0.98315
## lymph1 -0.030840 0.969631 0.547496 -0.056 0.95508
## vagina1 0.351398 1.421053 0.174868 2.010 0.04448 *
## dchexam1 -0.463338 0.629180 0.230020 -2.014 0.04397 *
## abnode1 0.170712 1.186149 0.433788 0.394 0.69392
## age 0.008096 1.008129 0.013913 0.582 0.56066
## yschool -0.128072 0.879790 0.039366 -3.253 0.00114 **
## npartner 0.076670 1.079686 0.053888 1.423 0.15480
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## iinfctchlamydia 0.7156 1.3974 0.5337 0.9595
## iinfctboth 0.7653 1.3067 0.5704 1.0268
## maritalMarried 1.0566 0.9464 0.4537 2.4606
## maritalSingle 1.5040 0.6649 0.8430 2.6830
## raceWhite 0.8945 1.1179 0.6781 1.1800
## os12m1 0.8137 1.2289 0.5370 1.2330
## os30d1 0.7118 1.4049 0.4459 1.1363
## rs12m1 1.0345 0.9666 0.4323 2.4756
## rs30d1 0.8230 1.2150 0.2718 2.4918
## abdpain1 1.2577 0.7951 0.9260 1.7083
## discharge1 1.1215 0.8916 0.8965 1.4031
## dysuria1 1.1783 0.8487 0.8693 1.5971
## condomsometime 0.9379 1.0662 0.5864 1.5002
## condomnever 0.7247 1.3798 0.4459 1.1778
## itch1 0.8629 1.1589 0.6371 1.1687
## lesion1 0.8322 1.2016 0.4328 1.6000
## rash1 1.0083 0.9917 0.4668 2.1780
## lymph1 0.9696 1.0313 0.3316 2.8355
## vagina1 1.4211 0.7037 1.0087 2.0020
## dchexam1 0.6292 1.5894 0.4008 0.9876
## abnode1 1.1861 0.8431 0.5069 2.7758
## age 1.0081 0.9919 0.9810 1.0360
## yschool 0.8798 1.1366 0.8145 0.9504
## npartner 1.0797 0.9262 0.9715 1.2000
##
## Concordance= 0.635 (se = 0.016 )
## Likelihood ratio test= 73.37 on 24 df, p=7e-07
## Wald test = 69.89 on 24 df, p=2e-06
## Score (logrank) test = 71.71 on 24 df, p=1e-06
From here, the goal is to remove extraneous variables, keeping only those that are statistically significant and that lower the AIC of the model.
After fitting a model with just these variables, condom use became less significant. However, never using condoms was much closer to significance (p-value of 0.14) than sometimes using condoms (p-value of 0.80). We fit a model that drops the condom variable, but the AIC increased, suggesting that we should keep it. To see whether recategorizing the values would increase significance and decrease the AIC, we split condom use into two categories: never and use, where use includes individuals who sometimes or always use condoms. The new p-value for never using condoms was 0.009, and the AIC decreased, so we kept the transformed variable in the model.
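The recategorization of condom use described above can be sketched as follows. The raw coding (1 = always, 2 = sometime, 3 = never) and the toy vector are assumptions made for illustration only; the report does not show how the variable was actually recoded.

```r
# Hypothetical raw coding: 1 = always, 2 = sometime, 3 = never.
condom_raw <- c(1, 2, 3, 3, 2, 1, 3)

# Three-level factor, as used in the full model.
condom3 <- factor(condom_raw, levels = 1:3,
                  labels = c("always", "sometime", "never"))

# Collapse "always" and "sometime" into a single "use" level.
# "use" is listed first so that it becomes the reference level.
condom2 <- factor(ifelse(condom3 == "never", "never", "use"),
                  levels = c("use", "never"))

print(table(condom3, condom2))
```

With `use` as the reference level, a Cox model reports a single `condomnever` coefficient, which is the form seen in the reduced model. Two candidate fits could then be compared with `AIC()`, with the lower value preferred.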
cox.model <- coxph(surv_object ~ iinfct + condom + vagina +
                     dchexam + yschool, data = std)
summary(cox.model)
## Call:
## coxph(formula = surv_object ~ iinfct + condom + vagina + dchexam +
## yschool, data = std)
##
## n= 877, number of events= 347
##
## coef exp(coef) se(coef) z Pr(>|z|)
## iinfctchlamydia -0.38468 0.68067 0.14618 -2.632 0.00850 **
## iinfctboth -0.24681 0.78129 0.14561 -1.695 0.09006 .
## condomnever -0.29779 0.74245 0.11551 -2.578 0.00994 **
## vagina1 0.40660 1.50171 0.16739 2.429 0.01514 *
## dchexam1 -0.37042 0.69044 0.22123 -1.674 0.09405 .
## yschool -0.14357 0.86626 0.03332 -4.308 1.64e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## iinfctchlamydia 0.6807 1.4692 0.5111 0.9065
## iinfctboth 0.7813 1.2799 0.5873 1.0393
## condomnever 0.7425 1.3469 0.5920 0.9311
## vagina1 1.5017 0.6659 1.0817 2.0848
## dchexam1 0.6904 1.4483 0.4475 1.0652
## yschool 0.8663 1.1544 0.8115 0.9247
##
## Concordance= 0.603 (se = 0.017 )
## Likelihood ratio test= 45.32 on 6 df, p=4e-08
## Wald test = 46.51 on 6 df, p=2e-08
## Score (logrank) test = 46.77 on 6 df, p=2e-08
With our reduced model, we still have a highly significant overall model, as supported by the likelihood ratio test, the Wald test, and the score test.
Next, we use residuals to address our considerations of stratification and transformation of the quantitative variables.
## chisq df p
## iinfct 3.5184 2 0.17
## condom 1.0539 1 0.30
## vagina 1.2835 1 0.26
## dchexam 0.4085 1 0.52
## yschool 0.0214 1 0.88
## GLOBAL 6.1176 6 0.41
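The table above has the layout that `cox.zph()` from the survival package prints when testing the proportional hazards assumption. Assuming that is how it was produced, the call on the reduced model would be `cox.zph(cox.model)`. Because the study data are not reproduced here, the sketch below runs the same kind of check on simulated data; the data frame `sim` and every name in it are hypothetical.

```r
library(survival)

# Simulated data for illustration only (not the STD study data).
set.seed(2022)
n <- 300
x <- rbinom(n, 1, 0.5)                             # binary covariate
event_time <- rexp(n, rate = 0.1 * exp(0.5 * x))   # proportional hazards hold
cens_time <- rexp(n, rate = 0.05)                  # independent censoring
sim <- data.frame(
  time = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time),
  x = x
)

fit <- coxph(Surv(time, status) ~ x, data = sim)

# Score test of proportional hazards, per term and globally.
zph <- cox.zph(fit)
print(zph)
```

Large p-values give no evidence against proportional hazards. In the table above every term, as well as the global test, has a p-value well above 0.05, which is consistent with leaving the reduced model unstratified.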
Since the curve follows the standard 45-degree reference line closely, we conclude that our final model is a good fit.
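A comparison against a 45-degree line is characteristic of a Cox-Snell residual plot, so the sketch below assumes that this is the diagnostic being described. It is run on simulated data because the study data are not reproduced here; `sim`, `fit`, and the other names are hypothetical.

```r
library(survival)

# Simulated data for illustration only (not the STD study data).
set.seed(2022)
n <- 300
x <- rbinom(n, 1, 0.5)
event_time <- rexp(n, rate = 0.1 * exp(0.5 * x))
cens_time <- rexp(n, rate = 0.05)
sim <- data.frame(
  time = pmin(event_time, cens_time),
  status = as.integer(event_time <= cens_time),
  x = x
)

fit <- coxph(Surv(time, status) ~ x, data = sim)

# Cox-Snell residual = event indicator minus martingale residual.
cox_snell <- sim$status - residuals(fit, type = "martingale")

# Nelson-Aalen cumulative hazard of the residuals, treating them as
# censored "times" with the original event indicator.
resid_fit <- survfit(Surv(cox_snell, sim$status) ~ 1)

plot(resid_fit$time, resid_fit$cumhaz, type = "s",
     xlab = "Cox-Snell residual", ylab = "Estimated cumulative hazard")
abline(0, 1, lty = 2)  # 45-degree reference line
```

If the model fits, the residuals behave like a censored sample from a unit exponential distribution, so the step curve should stay close to the dashed line. Departures in the right tail are common because few observations remain there.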
Although there were no obvious outliers for the condom use variable or initial infection type, the other variables had outliers that are summarized below:
Although we already went over the mathematical interpretation of our model, we want to also touch on some intuitive reasoning behind our coefficients. First, we saw earlier that each additional year of schooling leads to a further reduction in risk of reinfection. As individuals progress through the public schooling system, they are progressively taught more about sexual education and diseases. Increased awareness would intuitively lead to a decrease in risk of contraction because people become more aware of the dangers of STDs and how to prevent contracting them.
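The effect of schooling compounds multiplicatively on the hazard scale, which a short calculation makes concrete. The sketch below takes the `yschool` coefficient from the reduced model and assumes, as the model does, that its effect is linear on the log hazard scale with the other covariates held fixed; the variable names are illustrative.

```r
# Per-year log hazard ratio for yschool, copied from the reduced model.
coef_yschool <- -0.14357

# Hazard ratio for 1, 2, and 4 additional years of schooling.
years <- c(1, 2, 4)
hr_by_years <- exp(coef_yschool * years)
names(hr_by_years) <- paste0(years, "yr")
print(round(hr_by_years, 3))
```

One additional year corresponds to an estimated hazard about 13% lower, and four additional years to a hazard a little over 40% lower, relative to an otherwise comparable patient.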
In terms of the initial type of infection, we noticed that an initial infection of gonorrhea is associated with a higher risk of reinfection. Looking further into the differences between chlamydia and gonorrhea, we found that chlamydia is more common than gonorrhea, which is supported by our data. Further, we found that some strains of gonorrhea have become antibiotic-resistant, making the infection harder to treat and increasing its propensity to spread. A similar phenomenon has not been recorded for chlamydia.
The model showed that vaginal involvement considerably increased the risk of reinfection. Vaginal involvement leads to an increased susceptibility to infection since there is tissue damage. Patients who experience vaginal involvement should therefore be made aware that they are at a higher risk of contracting an STD and should be especially careful, being sure to discuss STDs and testing with their partners.
Interestingly, our model suggests that never using condoms actually decreases risk of reinfection. There are a few hypotheses that could be tested with further studies to explain this relationship. First, it may be that those individuals who are not using condoms are only in exclusive relationships and hence have one consistent partner. Alternatively, it may be the case that those people not using condoms are more selective with their partners and are actively being tested and/or having open discussions about STDs.
The last variable to discuss is discharge. This variable may require further exploration because we are not given any additional information in terms of the type of discharge. Vaginal discharge is actually a fairly common occurrence and does not necessarily imply that there is an issue. It would be helpful to have more information about the type of discharge and if the patient regularly has vaginal discharge.
The goal of this study was to analyze risk factors for
reinfection of gonorrhea and chlamydia in order to inform the population
and decrease the spread of STDs. From this model, we found that years of
education is the most influential factor in determining risk of
reinfection. Therefore, we suggest that sexual education be consistently
addressed earlier in school in order to raise awareness and communicate
the importance of safe sex practices and STD testing.
Lastly, we noticed in the exploratory data analysis that our population is lacking racial diversity. In further studies, it would be interesting to gather data from a more diverse population, and also gather information about income level to see if there is any correlation between income and risk of infection. Additionally, it’s our understanding that each of the individuals in this study is female. It would be interesting to conduct a similar study using a population of males to see if there’s a difference in survival curves between genders.
Klein, J. P., & Moeschberger, M. L. (2003). Survival Analysis: Techniques for Censored and Truncated Data (2nd ed.). Springer. 542 p.
Mitchell, H. (2004, May 29). Vaginal discharge–causes, diagnosis, and treatment. BMJ, 328(7451), 1306-8. doi: 10.1136/bmj.328.7451.1306. PMID: 15166070; PMCID: PMC420177.
What Is Gonorrhea & How Do You Get It? (n.d.). Planned Parenthood. https://www.plannedparenthood.org/learn/stds-hiv-safer-sex/gonorrhea.
Yocum, A., MD. (2022, October 22). What’s the Difference Between Chlamydia and Gonorrhea? K Health. https://khealth.com/learn/std/chlamydia-vs-gonorrhea/.